Unsupervised distance metric learning using predictability

Authors

Abhishek A. Gupta

Dean P. Foster

Lyle H. Ungar

Abstract

Distance-based learning methods, such as clustering and SVMs, depend on good distance metrics. This paper addresses unsupervised metric learning in the context of clustering. We seek transformations of the data that give clean, well-separated clusters, where clean clusters are those for which membership can be accurately predicted. The transformation (and hence the distance metric) is obtained by minimizing the blur ratio, defined as the ratio of the within-cluster variance to the total data variance in the transformed space. For the minimization we propose an iterative procedure, Clustering Predictions of Cluster Membership (CPCM). CPCM alternately (a) predicts cluster memberships (e.g., using linear regression) and (b) clusters these predictions (e.g., using k-means). With linear regression and k-means, this algorithm is guaranteed to converge to a fixed point. The resulting clusters are invariant to linear transformations of the original features, and the procedure tends to eliminate noise features by driving their weights to zero.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-08-23. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/885

Author affiliations: Abhishek A. Gupta and Dean P. Foster, Department of Statistics, University of Pennsylvania; Lyle H. Ungar, Department of Computer and Information Science, University of Pennsylvania.
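The alternation described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`cpcm`, `kmeans`, `seed_labels`, `blur_ratio`, `make_demo`), the farthest-point initialization, the intercept column, the one-hot encoding of memberships, and the toy data are all choices made here for illustration. The blur ratio is computed as within-cluster sum of squares over total sum of squares of the predicted memberships.

```python
import numpy as np

def blur_ratio(Z, labels):
    """Within-cluster sum of squares over total sum of squares of Z."""
    total = ((Z - Z.mean(axis=0)) ** 2).sum()
    within = sum(((Z[labels == j] - Z[labels == j].mean(axis=0)) ** 2).sum()
                 for j in np.unique(labels))
    return within / total

def assign(Z, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def seed_labels(X, k):
    """Assumed initializer: greedy farthest-point seeds, then nearest seed."""
    seeds = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        seeds.append(X[d.argmax()])
    return assign(X, np.array(seeds))

def kmeans(Z, labels, k, n_iter=100):
    """Lloyd's k-means on the rows of Z, warm-started from `labels`."""
    for _ in range(n_iter):
        # An empty cluster falls back to the overall mean (an assumption).
        centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else Z.mean(axis=0) for j in range(k)])
        new_labels = assign(Z, centers)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def cpcm(X, k, n_rounds=50):
    """Alternate (a) regressing memberships on X and (b) clustering the predictions."""
    Xb = np.hstack([X, np.ones((len(X), 1))])       # features plus intercept
    labels = seed_labels(X, k)
    for _ in range(n_rounds):
        Y = np.eye(k)[labels]                       # one-hot memberships
        W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)  # (a) linear regression
        Z = Xb @ W                                  # predicted memberships
        new_labels = kmeans(Z, labels, k)           # (b) k-means on predictions
        if np.array_equal(new_labels, labels):      # fixed point reached
            break
        labels = new_labels
    return labels, W, blur_ratio(Z, labels)

def make_demo(seed=0, n=100):
    """Toy data: one informative feature and one pure-noise feature."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    signal = np.where(y == 0, -5.0, 5.0) + 0.5 * rng.standard_normal(n)
    noise = rng.standard_normal(n)
    return np.column_stack([signal, noise]), y

if __name__ == "__main__":
    X, y = make_demo()
    labels, W, blur = cpcm(X, k=2)
    print("blur ratio:", blur)
    print("weights (rows: signal, noise, intercept):")
    print(W)
```

On this toy data the regression weight on the noise feature comes out much smaller in magnitude than the weight on the informative feature, which is the behaviour the abstract describes; how the method behaves on harder data depends on initialization and is not shown here.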

Similar resources

Semi-supervised composite kernel learning using distance metric learning techniques

Distance metrics play a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

Some Research Problems in Metric Learning and Manifold Learning

In the past few years, metric learning, semi-supervised learning, and manifold learning methods have aroused a great deal of interest in the machine learning community. Many machine learning and pattern recognition algorithms rely on a distance metric. Instead of choosing the metric manually, a promising approach is to learn the metric from data automatically. Besides some early work on metric ...

Distance Metric Learning: A Comprehensive Survey

Many machine learning algorithms, such as K Nearest Neighbor (KNN), rely heavily on the distance metric for the input data patterns. Distance metric learning aims to learn a distance metric for the input space of data from a given collection of pairs of similar/dissimilar points, such that the metric preserves the distance relation among the training data. In recent years, many studies have demonstrated, both empi...

Metric learning for unsupervised phoneme segmentation

Unsupervised phoneme segmentation aims at dividing a speech stream into phonemes without using any prior knowledge of linguistic contents and acoustic models. In [1], we formulated this problem into an optimization framework, and developed an objective function, summation of squared error (SSE) based on the Euclidean distance of cepstral features. However, it is unknown whether or not Euclidean...

An Overview of Distance Metric Learning

In our previous comprehensive survey [41], we have categorized the disparate issues in distance metric learning. Within each of the four categories, we have summarized existing work, disclosed their essential connections, strengths and weaknesses. The first category is supervised distance metric learning, which contains supervised global distance metric learning, local adaptive supervised dista...

Publication date: 2007
